Machine Learning Techniques to Predict Software Defect
نویسندگان
چکیده
Machine learning techniques have been dominating in the last two decades. The recently published comprehensive state-of-the-art review (Mohanty et al., 2010) justifies this issue. The ability of software quality models to accurately identify critical faulty components allows for the application of focused verification activities ranging from manual inspection to automated formal analysis methods. Therefore, software quality models to ensure the reliability of the delivered products. Accurate prediction of fault prone modules enables the verification and validation activities that includes quality models: Musa, 1998, logistic regression (Basili et al., 1996), discriminant analysis (Khoshgoftaar, 1996), the discriminant power techniques (Schneidewind, 1992), artificial neural network (Khoshgoftaar, 1995), genetic algorithm (Azar et al., 2002), and classification trees (Gokhale et al., 1997; Khoshgoftar et al., 2002; Selby et al., 1988; Fenton et al., 1999). A wide range of modeling techniques has been proposed and applied for software quality predictions. These include: proposed the Bayesian belief network as the most effective model to predict software quality. Classification is a popular approach to predict software defects and involves categorizing modules, which is represented by a set of metrics or code attributes into fault prone (fp) non fault prone (nfp) by means of a classification model derived from data (Lessman et al., 2008), statistical methods (Basili et al., 1996; Khoshgoftar & Allen, 1999), tree based methods, (Guo et al., 2004; Khoshgoftar et al., 2000; Menzies et al., 2004; Porter et al., 1990; Selby et al., 1988), neural networks (Khoshgoftar et al., 1995, 1997) and analogy based approaches (El-Emam et al., 2001; Ganeshan et al., 2000; Khoshgoftar et al., 2003), Decision tree (Selby et al., 1988). The discriminative power techniques correctly classified 75 out of 81 fault free modules, and 21 out of 31 faulty modules (Porter et al., 1992). Lessmann et al., (2008) used 10 software development datasets from NASA MDP repository to predict software defects. Most recently, Pendharkar (2010) used the same dataset to test the efficacy of their hybrid exhaustive search and probabilistic neural network (PNN), and simulated annealing (SA) method. In this chapter, we present a software defect prediction methodology based on GP, BPNN, GMDH, PNN, GRNN, TreeNet, CART, Random Forest Naïve Baye’s and J48 on the DATATRIEVE, PC1, PC3, PC4, MC1, KC1, KC2, KC3, CM1 and JM1 datasets. The rest of the chapter is organized in the following manner. A brief discussion about the overview of machine learning techniques is presented in section 2. Section 3 describes the experimental methodology. Section 4 presents a detailed discussion of the results and discussions. Finally, section 5 concludes the chapter. Ramakanta Mohanty Keshav Memorial Institute of Technology, India
منابع مشابه
Predicting Software Defects through SVM: An Empirical Approach
Software defect prediction is an important aspect of preventive maintenance of a software. Many techniques have been employed to improve software quality through defect prediction. This paper introduces an approach of defect prediction through a machine learning algorithm, support vector machines (SVM), by using the code smells as the factor. Smell prediction model based on support vector machi...
متن کاملComparison of local classifiers for cross-project defect prediction
There is a connection between static source code metrics, for example, lines of code or cyclomatic complexity and potential defects in the source code. Obviously, there is no closed formula, but with the field of machine learning and its techniques we have a tool at our disposal that has the ability to infer rules from large amounts of data. In this thesis, we use machine learning techniques to...
متن کاملComparative Analysis of Machine Learning Algorithms with Optimization Purposes
The field of optimization and machine learning are increasingly interplayed and optimization in different problems leads to the use of machine learning approaches. Machine learning algorithms work in reasonable computational time for specific classes of problems and have important role in extracting knowledge from large amount of data. In this paper, a methodology has been employed to opt...
متن کاملEstimating Handling Time of Software Defects
The problem of accurately predicting handling time for software defects is of great practical importance. However, it is difficult to suggest a practical generic algorithm for such estimates, due in part to the limited information available when opening a defect and the lack of a uniform standard for defect structure. We suggest an algorithm to address these challenges that is implementable ove...
متن کاملBenchmarking Machine Learning Technologies for Software Defect Detection
Machine Learning approaches are good in solving problems that have less information. In most cases, the software domain problems characterize as a process of learning that depend on the various circumstances and changes accordingly. A predictive model is constructed by using machine learning approaches and classified them into defective and non-defective modules. Machine learning techniques hel...
متن کاملApplication of ensemble learning techniques to model the atmospheric concentration of SO2
In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
متن کامل